Documentation for KWIC.PRG
Key Words in Context listings have been in use for
several years by technical libraries.
Essentially, they provide an abstracting service
where no provider of such a service exists. Key
words are extracted from the title of a book or
journal article, sorted, and the entire title is
printed in its original context; that is, the
full title of the work. A key word is simply any
word that is not on a list of words to be
excluded. Excluded words include the articles and
most prepositions and conjunctions. Thus, a
person doing research on migraine headaches looks
under 'migraine', 'headache', any other synonyms
he knows and under the drugs commonly prescribed
for this terrible malady. Some of the titles will
appear more than once during his search, of
course, but once he finishes he knows that he has
exhausted the information content of the set of
titles. The program can be used directly from the
monitor instead of a hard copy, but the listing
offers portability. Also, I find the presence of
the computer a distraction; one gets fascinated
with the research tool instead of the problem
being researched. (How many ex-chemists,
biologists, astronomers, ..., are now computer
programmers?)
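The extraction described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of KWIC.PRG: the names kwic_entries and EXCLUDED, the short exclusion list, and the rule for splitting a title into words are all assumptions made for the example.

```python
import re

# Hypothetical exclusion list; the real program reads its list from a file.
EXCLUDED = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def kwic_entries(titles, excluded=EXCLUDED):
    """Return sorted (key word, title) pairs, one per key word occurrence."""
    entries = []
    for title in titles:
        # Assumed word rule: runs of letters, digits, and apostrophes.
        for word in re.findall(r"[A-Za-z0-9']+", title):
            if word.lower() not in excluded:
                entries.append((word, title))
    # Plain tuple sort: by key word first, then by title.
    return sorted(entries)

titles = ["The Joy of Cooking", "Cooking with Herbs"]
for keyword, title in kwic_entries(titles):
    print(f"{keyword:<10} {title}")
```

Each title appears once for every key word it contains, which is why some titles turn up more than once during a search.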
Getting started
If you have a reasonably standard printer that is
powered up and ready to print, you can
demonstrate the program simply by executing it and
giving it the default response (a carriage return)
every time a question is asked. Sample files are
included to be used as a base for a built-in
demonstration.
How I use this program
I collect cookbooks. I have no database
program to assist me. The collection has several
sub-categories: paperbacks,
unusually prolific authors, books that are
encyclopedic in nature, books too big for
ordinary shelves, and so on. The sample file
included here is one of the sub-collections. The
main file of rather ordinary hardbound books is
too large to include in an upload. The main file
starts with the book title in column 1 and has the
author's name starting in column 40. (Note that
the sample file included here is a _sub_
collection and does not have the author's name
field.) The file is sorted by author name and
the books are shelved in that order too. Sorting
is done with a public domain sort program. Each
book is represented by a one line entry. This is
necessary because the sort program I use demands a
_consistent_ record length. That is, a record is
zero or more characters followed by a CR and LF.
To make a KWIC listing, I specify the particular
file of interest when 'input file' is called for.
I often make composite listings including several
sub-categories. To make a composite file, I find
the public domain program PCOMMA (aka PCOMMAND),
which emulates PC-DOS, invaluable. I have several
.BATch files which, when run, join up the individual
files in various ways and produce the desired
composite file.
When the file of 'bad words' is called for I use a
personalized file. Some words are used so often
_within_ a specialty that they become simply
'noise words'. In cookbooks, such words as
'cook', 'book', 'cookbook', and 'recipe' come up so
often as to be meaningless. When the program asks
for columns to be ignored, I specify column 39.
This means that the author's name is not a
keyword. After all, I already have a listing
sorted by author's name. I also sometimes have
notes beyond column 39 and I want them ignored
too, as far as key words are concerned.
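As a rough sketch of what the column limit does, assuming that everything after the specified column is skipped when key words are extracted: the function name keyword_region and the sample record are made up for the illustration.

```python
def keyword_region(record, last_keyword_column=39):
    """Return the part of a record that is scanned for key words.

    Columns are 1-based, as in the text, so column 39 is index 38.
    """
    return record[:last_keyword_column]

# Build a record with the title in column 1 and the author in column 40.
record = "Mastering the Art of French Cooking".ljust(39) + "Child, Julia"
print(repr(keyword_region(record).rstrip()))
```

The author field is still part of the record and still gets printed; it is only kept out of the key word search.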
When KWIC asks for the leftmost column for the
keyword, I specify column 60. I then specify an
Epson printer, with printing to be 137 columns
wide. So I end up with a hard copy with key words
nicely aligned on column 60 and the author's name
is on the same line (but not aligned properly,
unfortunately) so I can find the book physically
without referring to another index. That's how I
use it. Now on to the general nature of the beast.
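The alignment can be pictured as shifting each record sideways until its key word starts in the chosen column. The following is a minimal sketch under that assumption; the function name align_on_keyword is hypothetical, and how the real program treats a left context too long for the margin is a guess (here the leading characters are simply dropped).

```python
def align_on_keyword(record, keyword_start, keyword_column=60, width=137):
    """Shift a record so the key word begins at keyword_column (1-based).

    keyword_start is the 0-based index of the key word within the record.
    """
    pad = (keyword_column - 1) - keyword_start
    if pad >= 0:
        line = " " * pad + record
    else:
        # Left context is too long for the margin: drop its leading characters.
        line = record[-pad:]
    # Cut the line to the printer width and remove trailing blanks.
    return line[:width].rstrip()

record = "The Joy of Cooking"
line = align_on_keyword(record, record.index("Cooking"))
print(line)
```

Because the whole record moves with the key word, anything that follows the title, such as an author field, lands in a different column on each line.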
The Input file
The input file is prepared with your favorite text
editor or a word processor in ASCII mode. It is a
list of book titles, journal articles, or any
analogous item. An entry normally starts in
column 1 and can be as long as desired, within
reason. The program will work best, however, with
relatively short entries, say 80 characters or
fewer. Normal practice will result in most key
words having the initial letter in upper case.
The program will find them regardless of
case. But after they are found
they are sorted following the collating order of
ASCII. That means that 'a' follows 'Z'. Numbers
will be found as key words too. Sorting puts all
digits ahead of all letters. Blank lines in the
input file will be ignored.
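The effect of the ASCII collating order is easy to see with a small example. Python's default string sort compares character codes, which gives the same ordering for plain ASCII text; the word list is invented for the illustration.

```python
def ascii_sorted(words):
    """Sort by character code, which for ASCII text is the ASCII collating order."""
    return sorted(words)

print(ascii_sorted(["apple", "Zucchini", "101 Recipes", "Bread", "zest"]))
# ['101 Recipes', 'Bread', 'Zucchini', 'apple', 'zest']
```

Digits come first, then all upper case letters, then all lower case letters, so a title that starts with a lower case key word sorts after every capitalized one.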
The bad word file
You can make your own personalized bad word file
by modifying the file included with the .ARC. It
is a simple text file, too. The word must start
in column 1 and be followed _immediately_ by
[Return]. That is, 'apple' is not the same as
'apple '. These words should all be lower case.
You can enter the words in any order that occurs
to you; the program will automatically do a simple
re-sort of the bad word file every time it runs.
You can have as many bad word files as you wish.
The file included is specialized for cookbooks.
About the first 80 entries would apply to any English
title; simply remove the specialized words and
replace them with your own set.
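A sketch of how such a file might be read and consulted, to show why a trailing blank matters. The function names are hypothetical, and the binary search is an assumption on my part; the text above says only that the program sorts the list each time it runs.

```python
import bisect

def load_bad_words(lines):
    """Return a sorted list of bad words, one per line, compared exactly."""
    # Strip only the line terminator; a trailing space stays part of the word.
    words = [line.rstrip("\r\n") for line in lines]
    return sorted(w for w in words if w)

def is_bad_word(word, bad_words):
    """Binary-search the sorted list for the lower-cased word."""
    target = word.lower()
    i = bisect.bisect_left(bad_words, target)
    return i < len(bad_words) and bad_words[i] == target

bad_words = load_bad_words(["the\r\n", "cook\r\n", "apple \r\n", "and\r\n"])
print(is_bad_word("Cook", bad_words))   # True
print(is_bad_word("apple", bad_words))  # False: the list holds 'apple ' with a trailing space
```

The words in the list are lower case, so the candidate word is lowered before the comparison; the entry 'apple ' never matches because the comparison is exact.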
Program Output
After the program has run and extracted all the
key words and sorted them, it is ready to provide
output. Since the program may run for several
minutes, it allows you to get several
outputs from a single run of the program. The
monitor choice is mostly offered as a preview to
get an idea of whether things turned out OK.
Since it is limited to 80 columns width, it is not
very effective for long records. The basic output
will often be an Epson printer with the 137
column line choice. This permits you to align the
key words at, say, column 60 and get a nice-looking
output with reasonably sized titles.
Non-Epson printers
If you have a non-Epson printer, there are two
alternatives. The first alternative is to set up
the printer to produce some kind of compressed
printing _before_ you run KWIC. KWIC will not
send anything except data to the printer if you
specify non-Epson. You can also use this approach
if you want more than 137 columns on an Epson
printer; the printer can easily go to 160 columns
and can even be pushed to exceed that.
The other alternative is to specify output to a
file. This will be an ordinary ASCII file which
you can read into a text editor, perhaps do
further editing, and output the same way you would
any other text file. One word about writing to a
file: the program uses the default Personal
Pascal text file write and it performs an
incredible amount of slow activity on the disk.
If you have a nervous temperament, as I do, and
you see hundreds of writes to your disk, you may
get very tense. The program works fine, but if
this bothers you, write to a blank floppy disk
(making things even slower!) and then copy that
file to your hard disk. I could have speeded this
write up, but considering the nature of the
program, it just didn't seem worth the effort.
Note also that the file produced can easily be quite
large; one that I commonly produce is in excess of
200,000 bytes.
Loose ends
The program allows input files to be up to 110,000
bytes long and to have up to 8,000 key words. Normal
printing would produce a listing of up to 140 pages.
One of these sizes may be too small for your
situation, or the ratio (the number of key words
per title) may be wrong for you. These numbers
were chosen to allow the program to run in a
system that has about 250K bytes of free RAM. If
you want a customized version, send me E-Mail on
GEnie and I can probably make a special version
for you to fit your needs.
The program doesn't care what file names or file
name extensions you use; the names provided on the
file selectors are merely suggestions.
For those interested in Pascal, note that the sort
program included can be used as a debugged
Quicksort. To customize it, simply change the
type declarations and the SWAP procedure. The
base routine is fast; it is slow as used here
only because it uses Pascal string logic to compare two
11-character strings. This could easily be
speeded up, but considering the nature of this
program, it didn't seem worthwhile. The procedure
that reads a file of arbitrary length into an
ARRAY of characters might also be useful; it seems
that so many programs start out (or should start
out) by doing just that.
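For readers who want to see the shape of such a routine without the Pascal source in front of them, here is a generic Quicksort sketch in Python. It is not a translation of the included code: the partition scheme, the names swap, key_of, and quicksort, and the idea that only the first 11 characters of each record are compared are assumptions made for the illustration.

```python
def swap(items, i, j):
    """Exchange two elements; the one routine that depends on the record type."""
    items[i], items[j] = items[j], items[i]

def key_of(record, key_length=11):
    """Compare records on their first key_length characters only."""
    return record[:key_length]

def quicksort(items, lo=0, hi=None):
    """Sort items in place with a recursive Quicksort (middle element as pivot)."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return
    pivot = key_of(items[(lo + hi) // 2])
    i, j = lo, hi
    while i <= j:
        # Skip elements already on the correct side of the pivot.
        while key_of(items[i]) < pivot:
            i += 1
        while key_of(items[j]) > pivot:
            j -= 1
        if i <= j:
            swap(items, i, j)
            i += 1
            j -= 1
    quicksort(items, lo, j)
    quicksort(items, i, hi)

words = ["pear", "apple", "fig", "Zucchini", "banana"]
quicksort(words)
print(words)
```

Keeping the comparison key and the swap in their own small routines is what makes it practical to reuse the sort for another record type.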
This program may be freely copied, uploaded, and
propagated by any suitable means as long as the
content passed on includes _all_ the files contained
in the original ARChive. Additionally, the name
should remain KWIC.ARC unless the target system
already has that name in use. Placed in the public domain
July 1991.
Merlin Hanson
GEnie address: M.L.HANSON